Quasispecies theory for finite populations 



Jeong-Man Park 1 ' 2 , Enrique Munoz 1 , and Michael W. Deem 1 

1 Department of Physics & Astronomy, Rice University, Houston, Texas 77005-1892, USA 
2 Department of Physics, The Catholic University of Korea, Bucheon 420-743, Korea 

We present stochastic, finite-population formulations of the Crow-Kimura and Eigen models of 
quasispecies theory, for fitness functions that depend in an arbitrary way on the number of mutations 
from the wild type. We include back mutations in our description. We show that the fluctuation 
of the population numbers about the average values are exceedingly large in these physical models 
of evolution. We further show that horizontal gene transfer reduces by orders of magnitude the 
fluctuations in the population numbers and reduces the accumulation of deleterious mutations in 
the finite population due to Muller's ratchet. Indeed the population sizes needed to converge to the 
infinite population limit are often larger than those found in nature for smooth fitness functions in 
the absence of horizontal gene transfer. These analytical results are derived for the steady-state by 
means of a field-theoretic representation. Numerical results are presented that indicate horizontal 
gene transfer speeds up the dynamics of evolution as well. 
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INTRODUCTION 



Biological populations in nature are finite. In partic- 
ular, it is clear that the number of individuals in a pop- 
ulation is much smaller than the number of possible ge- 
netic sequences, even for genomes of modest length. For 
example, the largest populations observed in biological 
systems, RNA viruses, are on the order of N = 10 12 vi- 
ral particles within a single infected organism [l[. These 
viruses possess a relatively short genome of length L ~ 
10 3 — 10 4 bases and hence the theoretical size of the 
sequence space is 4 L ~ 10 6000 ^> N. Even the region 
of phase space for which fitness is high is typically much 
larger than the biological population size. From this ex- 
ample, it is clear that no real biological population will 
be able to sample the entire sequence space during evo- 
lutionary dynamics Q, and therefore finite population 
size effects may be important for a realistic description 
of evolution [3|. Finite populations with asexual repro- 
duction are subject to the "Muller's ratchet" effect 
which is the tendency to accumulate deleterious muta- 
tions in finite populations UQ. It has been suggested 
that horizontal gene transfer and recombination may pro- 
vide a way to escape Muller's ratchet in small populations 
an d this mechanism has been proposed as one of 
the evolutionary advantages of sex, despite the additional 
mutational load for fitness functions with positive cpista- 
sis 0-0 [UHlIl • The role of the finite population size in 
the Muller's ratchet effect has been previously studied by 
the traveling- wave approximation 3, 1(| . This theoret- 
ical approach introduces an approximate treatment, by 
assuming deterministic dynamics for the bulk of the pop- 
ulation, but stochastic dynamics for the edge composed 
of the class of highest fitness genotypes. The determinis- 
tic component of this theory, which considers single point 
mutations coupled to replication, is similar to traditional 



quasispecies models for infinite populations. These pre- 
vious studies considered only linear fitness functions and 
analyzed in detail the case of no back mutations, an ap- 
proximation which changes the dynamics and leads to a 
different stead y-st ate distribution. An exception is the 
model in Ref. |33[, which presents a mean- field approx- 
imation which incorporates single back mutations in a 
linear fitness. 



We here include back mutations and consider fitness 
functions that depend in an arbitrary way on the number 
of mutations from the wild type in our exact description. 



repre- 
17 1 and the Eigen 



Quasispecies models for molecular evolution, 
sented by the Crow-Kimura model 
model [18l421| . are traditionally formulated in the lan- 
guage of chemical kinetics. That is, they describe 
the basic processes of mutation and selection in an 
infinite population of sclf-rcplicating, information en- 
coding molecules such as RNA or DNA, which are 
assumed to be drawn from a binary alphabet (e.g. 
purines/pyrimidines). These models exhibit a phase 
transition in the infinite genome limit flil - 1261 ]. separat- 
ing an organized or quasispecies phase from a disordered 
phase. This phase transition occurs when the mutation 
rate exceeds a critical value, which depends on the na- 
ture of the fitness function 25|, |27| • The phase transition 
is usually of first order for binary alphabets [25, 27 1, but 
it is of higher order for smooth fitness functions in larger 
alphabets [28| . The quasispecies is composed by a collec- 
tion of nearly neutral mutants, that is, a cloud of closely 
related individuals sharing similar fitness values, rather 
than by a single sequence type. Despite its abstract char- 
acter, the quasispecies model has been successfully ap- 
plied to interpret experimental studies in RNA viruses 

Mil. 
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FIG. 1: Fluctuation in the number of individuals with a given 
sequence composition. The quadratic fitness is used in the 
parallel model, with L — 200 and k — 4.0. The theory is 
obtained from Eqs. and (|12|) . Fluctuations decrease by 
orders of magnitude with increasing horizontal gene transfer 
rate, v. 



FIG. 3: The average composition as a function of time, aver- 
aged over 50 independent Gillespie simulations, with popula- 
tion size N = 10 4 (solid curves). Also shown are one standard 
deviation envelopes ±0"(i) (dotted curves). The steady-state 
averages (u) ± \J ((Su) 2 } are displayed as solid lines for refer- 
ence. 
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FIG. 2: Stochastic results obtained by averaging over 50 
independent Gillespie simulations, are shown and compared 
with analytical theory, for v = 7.0. 



FINITE POPULATION EFFECTS IN THE 
CROW-KIMURA MODEL 

In the infinite population limit, the mean field ap- 
proach that is customary in chemical kinetics is justified, 
and the evolution of the probability distribution of se- 
quence types can be described by a deterministic system 
of differential equations. This mean field approach can- 
not capture the fluctuations in the numbers of individuals 
with different sequences, which are a consequence of the 
stochastic dynamics of the process. An accurate descrip- 
tion of all aspects of a finite population therefore requires 
a master equation formulation We here consider ar- 
bitrary fitness functions. The special case of linear fitness 
functions /(£) = a£, have been analyzed in [HJ 16, 33|. 

We consider a finite population, composed of N < oo 
binary purine/pyrimidine sequences, of length L. The 
terms in the master equation for the Crow-Kimura, or 



parallel, model are i) a replication term, whereby each 
individual of sequence Si reproduces at a rate Lf(Si) 
and the offspring replaces a random member of the pop- 
ulation, ii) a mutation term, whereby each base in a se- 
quence mutates at a rate per unit time, and iii) a hor- 
izontal gene transfer term, whereby bases in a sequence 
are replaced at rate v per unit time with bases randomly 
chosen from the population. We assume that the repli- 
cation rate, or microscopic fitness, is a function of the 
Hamming distance from the wild-type genome, and hence 
of the one-dimensional coordinate < £ < L represent- 
ing the alignment of an individual's sequence with the 
wild type. The master equation can be exactly projected 
onto the £ coordinate and defines the rates at which the 
sequences of individuals change with time due to replica- 
tion, mutation, and horizontal gene transfer. We define 
(1 + u)/2 to be the probability of a wild type letter in 
the sequence, p± — (1 ±it)/2 is the probability of insert- 
ing a wil d- type or non- wild-type letter by horizontal gene 
transfer 27], |34[, and 



(1) 



is the 'average base composition,' where is the number 
of individuals at coordinate £. 

We formulate the master equation for the probabil- 
ity distribution P({n^};t), as a function of the set of 
occupation numbers {n^}o<(<L- As in the classical, in- 
finite population Crow-Kimura model [l7j |. we consider 
point mutation with rate /x, and replication with a rate 
r(£) = Lf(£), while preserving the population size N. In 
addition, we consider horizontal gene transfer of single 
letters between an individual sequence and the popula- 
tion, with rate v. 
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The master equation describing this process is 



d_ 

dt 



x P(n^ - 1, n,£< + 1) - n^'P({n^})] 

L 

+ /i5^[(L-0(n c + l)PK + l,n f+ i-l) 
+ £(n £ + l)P(n ? _i - 1, 7i£ + 1) - Ln£ 

L 

x ^})] + ^[ P+ (L-OK + l) 

x P(nt + l,nz + i-l)+fr-(nt + i) 

x P(n ? _i-l,n ? + l)-n f {p + (i-C) 

+ p-0^(K»] (2) 

Note that this exact master equation includes 'back 
mutations' often ignored in the literature [l5, 16 1. Note 
that the approximation of setting back mutations to zero 
leads to both different dynamics and a different steady- 
state. 



Mapping to a field theory 

We seek analytical expressions for the fluctuations in 
number of individuals with given sequence compositions 
in the finite population parallel model. We derive these 
results by means of a field-theoretic method [25, 35, 3(§. 
This approach provides a system of coupled differential 
equations for the probability distribution and the fluc- 
tuation of numbers of individuals with given sequence 
composition, whose computational solution is essentially 
instantaneous. These results give us the fluctuation and 
correlation in population numbers and are an exact ex- 
pansion in the inverse of the population size. We in- 
troduce an exact representation of the classical master 
equation in terms of a many-body quantum theory [25j j. 
For that purpose, we define the population state vector 



!*(*)> = E p (W;*)iK}> 



(3) 



with 



IOh}) = K>«i>- ••>«£> = IJ®K) (4) 

This population state vector evolves according to a 
Schrodingcr equation in imaginary time, 



-|*(t)> = -#!*(*)> 

which possesses the formal solution 
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FIG. 4: (Color online) Finite population versus infinite pop- 
ulation results for the probability distribution of the parallel 
model with quadratic fitness. Note that the Muller's ratchet 
phenomenon, whereby fitness is reduced for finite populations, 
is greatly suppressed for v > 0. Here k = 4 and L — 200, and 
the stochastic results are obtained by averaging over 50 inde- 
pendent numerical experiments. 



with |\l/(0)) = representing the initial configura- 

tion of the population. The master equation is writ- 
ten in second quantized form, with a Hamiltonian ex- 
pressed in terms of boson creation and destruction oper- 
ators [&£ , ] = £/ , whose action over the occupation 

number vectors is defined by a^\n^} = n^\n^ — 1), and 
al\n^} = |ri£ + 1). The Hamiltonian is given by 



1 L 



N 

U'=o 

L 
L 

- 4) a «] 



4)% +£(4-i ~ a 

£+ i - 4) a e + p 



(7) 



The terms proportional to / represent replication, /i rep- 
resent mutation, and v represent horizontal gene trans- 
fer. The population average of a (normal-ordered) clas- 
sical observable, represented by the operator F{ {a,f\ ), is 
obtained by the inner product with the "sum" [351 ] bra 



(nt=o^) ; 



(F) = (-\F{{a s })Mt)) = (■\F({&s})e- m \{4}) (8) 

A Trotter factorization is introduced for the evolution 
operator e~ m in a basis of coherent states, defined as 
a^z^) = Z(\z£). This procedure leads to a path integral 



representation [25. 27, 2 



(F) = J [Dz*Vz}F({z^t/e)}) ( 



S[{z*},{z}] 



(9) 
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Here z are the coherent state field of the second quantized 
theory of the parallel model, and S is the corresponding 
action. The action in the exponent of Eq. ^ is given, 



after the change of variables z* 
time by 



1 + z, in continuous 



J 



S[{z},{z}} =J2j a *'| « c (t')« c (t)-nc Ml + **(*')] 6{t) + z^ - fj,[{L - $z i+1 + {z^ - Lz $ ]z e 



- Op+h+i + Zp-h-i - {{l - 0p+ + £p-}hW + - ) 10 ) 

£'=0 
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In the limit of a large population, we look for a saddle- 
point in the action Eq. (TTU)) . From the condition 



ss 



dition 



0, we obtain 



0. From the con- 



as 



<5z £ (t) 



= 0, we find the saddle-point solution 

z|(i) = NP{(t), where P^ satisfies the differential equa- 
tion for infinite population quasispecies theory, general- 
ized to include horizontal gene transfer [27, 34 1: 



The fluctuations in the number of individuals with a 
given sequence composition are obtained from the rela- 
tion 



(5n £ ) 



(15) 



Continuous and discontinuous fitness functions 



+ v\p+{L - £ + l)P e _! + p_(£ + l)P e+ i - {(L 

L 
£'=0 

Details are given in Appendix 1. 



Fluctuations 

To calculate fluctuations, we expand the action up 
to second order, to obtain the correlation matrix 
(Sz^(t)8z^'(t)) = C^^'(i), which in continuous time 
evolves according to the Lyapunov equation 



dt 



C = AC + CA 1 



D 



subject to the initial condition g (0) = — n^S^ g 
the matrices A and B are defined by 



(12) 
Here, 



[A] 



J £-i,«' 



(L-£ + l)[p + i/p+]+^ [£/(£) 



-J2Lf(^)P 6l -H(L-Op+ + ^P-}-LlA 
£i 

= 6tf2Lf(Z)NP 6 - L[f(i) + m')]NP^ (14) 
See Appendix 2 for details in the derivation. 



We consider two example fitness functions, which ex- 
hibit a quasi-species phase transition in the infinite 
genome length limit L — > oo. The sharp peak represents 
the extreme case of the wild type sequence replicating at 
a high rate, and all other sequences replicating at a single 
lower rate. The sharp peak fitness function represents a 
very strong selective advantage for the wild type. For the 
sharp peak /(£) = AS^l, from Eq. (fTT]) and large L, we 
find that the wild-type probability 



— P L ~LAP L {1 
at 



P L )-L{ii + v P -)P L 



(16) 



At steady-state, taking into account that u = 1 — 0(L~ l ) 
for the sharp peak, we have p_ = (1 — u)/2 = 0(i _1 ), 
and after Eq. (fil)f we find 



o, 

1 - fi/A + 0(L- v ), 



i>l 

4 <i 



(17) 



Notice that the steady-state distribution is not affected 
by horizontal gene transfer (y > 0). To obtain the fluc- 
tuations in the probability distribution, we consider Eq. 
(fT2"|) for the matrix element Cl,l- The terms Cl,l±i are 
0(L~ l ). We also notice that E^oQi.i = ~ NP L, to 
find that the stationary solution of Eq. (|12|) is given by 

= LANP L (l-P L )-nLC LlL -vp-LC L , L 



LAN P[ 



LA{\ - P L )C hih 
ANP L (1-2P L ) + [(A- n 



- LAP L C L . L 
vpJ) - 2AP L ]C L , L 
(18) 
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From Eq. (|18j) . we have A — fx — vp^ = APl, and substi- 
tuting into Eq. (|T8)) we obtain 

Clx = N(l - 2P L ) (19) 

Substitution of this result into Eq. ([To]) shows that the 
fluctuation is given by 



((6n i=L y-)/N* 



0, 



fi/A > 1 



H/{NA), (if A < 1 



(20) 



a result first given in Ref. [37[ by a different method. 

The second fitness function we consider is one for which 
the replication rate decreases continuously as a function 
of the Hamming distance from the wild type. In partic- 
ular, we choose a quadratic fitness /(£) = (fc/2)(2£/L — 
l) 2 . The quadratic fitness represents any continuous fit- 
ness function, for which mutants reproduce more slowly 
than the wild type, in a way that depends continuously 
on the Hamming distance from the wild type. Figure [T] 
shows that horizontal gene transfer reduces by orders of 
magnitude the fluctuations in number of individuals with 
a given sequence composition, ri£. Indeed, a small rate 
of horizontal gene transfer is enough to reduce by several 
orders of magnitude these fluctuations, as compared to 
the case without horizontal gene transfer, v = 0. 

The linear fitness function /(£) = A£/L was considered 
in [33| and in ljj 16 1 in the absence of back mutations. 
The steady-state exhibits no phase transition for the lin- 
ear fitness. We skip this example in favor of the forms 
considered above. 



Stochastic simulations 

We performed Lebowitz/Gillespie simulations (38l. |39| 
in which we explicitly simulate a population of size -/V un- 
dergoing the stochastic processes of mutation, horizontal 
gene transfer, and replication. In Fig. [5] and Fig. [31 we 
compare our theory with stochastic simulations, at differ- 
ent rates of horizontal gene transfer. The results obtained 
from stochastic simulations converge toward the theo- 
retical value calculated from Eqs. (fTTj) and (|T2|) as the 
size of the population, N, increases. Non-zero horizontal 
gene transfer rates both reduce fluctuations and acceler- 
ate convergence towards the infinite-population value of 
the mean fitness. 

In Fig. 21 the steady-state probability distribution ob- 
tained from the numerical solution of Eq. ([lip is com- 
pared with the distributions obtained from stochastic 
simulations, for different sizes, N, of the population. The 
convergence with TV toward the infinite-population limit 
is more rapid for non-zero v. Indeed for smooth fitness 
functions, the infinite population limit is only reached for 
population sizes larger than those commonly found in na- 
ture. For the discontinuous sharp peak fitness function, 
on the other hand, fluctuations are small, Eq. (|20[) . and 
the convergence to the infinite population limit is rapid. 
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FIG. 5: Fluctuations in the probability distribution for the 
Crow-Kimura model, obtained from stochastic simulations 
using the Gillespie method (dots and diamonds) at differ- 
ent sizes of the population, in the absence of horizontal gene 
transfer v = 0. Convergence towards the theoretical curve Eq. 
(I12[) (solid line) is observed. Here L = 200, and the quadratic 
fitness with k — 4.0 and fi = 1 was considered. 



Another point from Fig.[3jis that horizontal gene trans- 
fer speeds up the rate of evolution. We see that the con- 
vergence to the steady state is more rapid for increased 
horizontal gene transfer rates. Numerical experiments 
have shown that the effect of horizontal gene transfer on 
the rate of evolution is especially dramatic for rugged 
fitness landscapes 0, 41 1. At the local scale, biolog- 
ical fitness landscapes may be relatively smooth. At 
larger genetic distances, however, we expect biological 
fitness landscapes to be rugged. Correlations exists in 
the rugged landscape, and horizontal gene transfer cou- 
ples to those correlations in a way that allows evolution to 
speed up dramatically [42(. We expect that this speedup 
of evolution on rugged landscaped is one of the most sig- 
nificant effects of horizontal gene transfer in biology. 

Note that when v = the number fluctuations for the 
case of fitness functions for which the population is not 
exponentially localized at £ = L (i.e. continuous fitness 
functions) are large in comparison to the fluctuations for 
a localized population, e.g. sharp peak. Another way to 
see this effect is shown in Fig. [5l where for v = 0, the 
convergence to N —> 00 is slow. 

As a final remark, we tested the validity of the de- 
scription of the stochastic process in the language of 
Hamming distance classes, as used in our theory. For 
that purpose, we performed numerical experiments with 
Lebowitz- Gillespie simulations with both a finite popula- 
tion of explicit sequences [27[ , and the analogous system 
in the representation of Hamming distance classes. As 
expected from a simple argument based on permutation 
invariance of the fitness function that shows the stochas- 
tic class dynamics is an exact projection of the stochastic 
sequence dynamics, both descriptions yield exactly the 
same statistics, as shown in Fig. [51 
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FIG. 6: Probability distributions for the Crow-Kimura model, 
obtained from stochastic simulations using the Gillespie 
method with explicit sequences or alternatively with Ham- 
ming distance classes. Clearly both descriptions are statis- 
tically identical. Here L — 200, and the quadratic fitness is 
used with k — 4.0, (jl = 1, and for a population of N = 10 9 
individuals. 



THE EIGEN MODEL 

We now turn to the Eigen model. In contrast to the 
parallel model, mutation and horizontal gene transfer are 
assumed to occur only during replication in the Eigen 
model. That is, multiple mutations occur along each 
sequence as a consequence of errors in the replication 
process, and during this process horizontal gene trans- 
fer with probability u/L per letter can also occur. The 
transfer matrix for mutations from class £ into class £ is 
denoted by Q ^ g j2f| , 



(21) 



Here, q ~ 1 characterizes the fidelity in the replication 
process, when 1 — q is the probability (per site) that 
an incorrect letter is placed by the polymerase enzyme. 
Note that 'back mutations', often ignored in the litera- 
ture, are included in the Eigen model. There is also ran- 
dom degradation of individuals with rate Ld. We again 
seek to calculate shifts in the average population distribu- 
tion as well as fluctuations about the average for a finite 
population of individuals following the dynamics of the 
Eigen model master equation. Here, terms proportional 
to (1— u/L) represents the evolutionary processes of repli- 
cation and multiple mutations in the absence of horizon- 
tal gene transfer. On the other hand, the terms propor- 



tional to v/L represent the coupled sequential processes 
of replication, horizontal gene transfer and multiple mu- 
tations. We also consider the possibility of degradation 
through terms proportional to the degradation rate d(£). 
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Mapping to a field theory 

By the same method as in the parallel model, we map 
the master equation into a second quantized formulation, 
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with Hamiltonian 



H- T )(L/N) Yl 1 

£,f',£"=o 



With a similar method as in the parallel model, we in- 
troduce coherent states in a Trotter factorization of the 
) fiO^H^ ) evolution operator, as defined in Eq. (JHJ. From this pro- 

^ 4' £" cedure, we derive the field theory for the Eigen model as 
well. In this case, the action given by 



a^„+(L/N) £ d(f')6j(6j-at ) 
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In the limit of a large population, we look for a 
saddle-point in the action Eq. (IM1) . From the condition 
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0, we obtain z^(t) = 0. From the second equa- 
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the differential equation 



0, we find that P e (t) = z?(t)/N satisfies 
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FIG. 7: Fluctuations in the probability distribution, as pre- 
dicted from our theory Eqs. (|25H26|I . for the Eigen model and 
quadratic fitness, at different horizontal gene transfer rates, v. 
Here L — 200, k — 4.0, and \x — 1. Fluctuations decrease by 
orders of magnitude with increasing horizontal gene transfer 
rate. 



and the initial condition corresponds to P^(0) = n®/N, 
as derived in Appendix 3. This is exactly the differential 
equation for P%(t) from infinite population quasispccies 
theory fHHl. 

By expanding the action Eq. (|M|) up to second or- 
der to calculate the matrix of correlations, as shown in Appendix 4, we obtain in the continuous time limit the 
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Lyapunov Eq. p2[) . with matrices A defined by 
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(27) 



Continuous and discontinuous fitness functions 

For the sharp peak /(£) = (A — Ao)df,L + ^4o ; for the 
Eigen model in the absence of horizontal gene transfer 
(v = 0), we obtain that the wild type probability is 

L 

- P L [AP L 

e'=o 

+ A J2 P^] = 0(28) 
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FIG. 8: Probability distributions, as predicted from our the- 
ory, for the Eigen model and quadratic fitness, at different 
recombination rates. Here L — 200, k — 4.0, and /j, — 1 . 



Since q ~ 1, (the fidelity in the replication process is very 
high), then 1-gCl and Eq. (f28|) becomes. 



q L AP L - P L [(A - A Q )P L + A ] = 



(29) 



By defining q L = e M , we obtain for the probability of 
the wild-type 



Pf=i 



0, 



A < eM 



( e -»A-A )/{A-A ), A>e»A 



(30) 



For the correlation matrix, we define g = jfC^ ^> , and 
find that the stationary solution for Z?l,l in the absence 
of degradation = is given by 

1 L 

= jjB L , L + J2 [Al.^D^.l + A L , (l Ds uL ] (31) 



Si=o 



From this equation, we find Al,^ 1 D^ 1 ^l = —^JjBlx- 
Hence, expanding the left hand side explicitly, we find 

L r L 

6=0 L {"=0 



(E^) p fi)^-/&) Pi 



= -[Q L ,Lf(L)P L -f(L)Pl] 



D 



(32) 



Expanding this equation when L is large and q ~ 1, we 
find 

[<7 L A - (A - A )Pl - A - (A - Ao)P L ]D LtL 

= AP L (P L - q L ) + q L APl - A Pl (33) 



Substituting the result P L 
find 



q^A-Ap 

A — An 



from Eq. (|3"0]l. we 



1 



L,L 



{a - A y 



[AA -At- {q A) + q AAq] (34) 
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The fluctuation in the number of individuals with the 
wild- type sequence is obtained from Eq. (|15[) . 



N 2 



0, 



A < eM 
A > e»A 



(35) 



For smooth fitness functions, there are large fluctua- 
tions in the population numbers in the absence of hori- 
zontal gene transfer. In Fig. [7] we present the fluctuations 
in the number of individuals with a given sequence for 
the quadratic fitness, as predicted from our theory Eqs. 
(f2"5Tl2"T)) . A moderate horizontal gene transfer rate re- 
duces by orders of magnitude the fluctuations. In Fig. 
|8] inset, we present the equilibrium probability distri- 
butions, for different rates of horizontal gene transfer, 
as obtained from our theory for the quadratic fitness 
/(£) = (fc/2)(2£/L - l) 2 /2 + 1. For this fitness function 
with negative epistasis, horizontal gene transfer reduces 
the mean fitness in the infinite population limit 27 1. 



CONCLUSION 

For both the parallel and Eigcn models, we have found 
that horizontal gene transfer reduces by orders of magni- 
tude the fluctuations in the number of individuals with 
a given sequence composition for smooth fitness func- 
tions, such as quadratic. Horizontal gene transfer also 
reduces the variability within and between independent 
experiments for smooth fitness functions. Finally, hori- 
zontal gene transfer substantially reduces the "Mullcr's 
ratchet" phenomenon, whereby fitness is reduced in finite 
populations relative to the infinite population limit. For 
the sharp peak fitness, horizontal gene transfer does not 
modify the steady-state distribution of fluctuations. 

The reduction in finite populations by horizontal gene 
transfer of both the magnitude of the Mullcr's ratchet 
phenomenon and the fluctuations in population 

numbers should be observable in experiments. The fluc- 
tuation in population numbers can be measured cither 
at different time points in long experiments or as fluc- 
tuations between different experimental replicates. The 
latter is likely to be more feasible in the laboratory. 
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action Eq. (fT0|) in the coherent fields z^(t) and z%(t). The 
first condition is 

+ £3£-i(t) - Lz e (t)} - v[(L - t)p+k+i(t) 

+ ep-«c-i-{(i-f)p+ + fp-}2e(*)] 

L L 

- ££/(&)(! +%(*))[%(*) 

fi=0f 2 =0 

- %W](%,^ 2 W + ^W%,?)=0 (36) 



where T is the final integration time in Eq. (|I0[) . which 
we typically set as T = oo. The solution which satisfies 
this saddle-point condition is = 0, for < t < T. 

The saddle point condition in the fields z^(t) is 



mo) - T #L^) + % - 



ss 

5z 6 (t) ~ l + z 5 (0) J " w ' dt 

x Zc _i(t) + (£ + l)z i+1 (t) - Lz^t)} - v\(L - i 

+ l)p+«c_i(t) + (£ + l)p-« f +i(*) - i( L - 0P+ 

L L 

+ to-}*®] -j^EE £/(£iH%,f[%(*) 

6 =0£ 2 =0 

- % (*)] + (! + %)[%,€ ~ <W]}% (*)*&(*) = 

(37) 

In combination with the solution z|(t) = obtained from 
Eq. (|36)) . Eq. (|37|) provides the differential equation for 
the probability distribution P$(t) = z^(t)/N, 



-P £ - n[(L-£ + 1)P £ - X + (£ + l)P c+1 -ZP c ] 

+ ^[p+(L-e + f)p c -i + p-(e + i)p ?+ i 

L 

- {{L - OP+ + tP-}P 6 ] + [r(f) - J2 r(?) p t'] p t 

f'=o 

(38) 



and the initial condition Pf(0) = n®/N. In deriving Eq. 

([38} from Eq. ([37]), the property E^o-^W = 1 was 
used, and we introduce the notation r(£) = Lf(£). 



APPENDIX 2 



APPENDIX 1 

We present the derivation of the saddle point equations 
for the Kimura model. We look for a saddle point of the 



We next consider the expansion of the action Eq. (flOj) 
near the saddle-point S c . For convenience, we define a 
discrete time label k = t/e, with e — > 0. Fluctuations 
near the saddle-point solution are given by Sz^(k) = 
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Zf(fc) — Zf(k), and <5z £ (fc) = z^{k) — zUk). This gh 



S-S c = 

e,e'=o 



Sz^{0)Sz^(0)S^ + -n £ £z £ (0)<5%(0) 

t/e , 

x 8^ e> + ^ < 5z^(k)8z^ (k)S^ ^ — e5z^(k)5z^' (k) 
k=l ^ 

x [5 u >r(t)NP e (k - 1) - riONP^k - 1) 

x P^(fc-l)]}+£^(fc)^(fc-l){ 
J fc=i ^ 



e/z[(L - £ + 1)^-1^ + (£ + - 
eu[(L - e + l)p+^_i )£ ' + (£ + l)P-^+i, e ' 
{(L - + £p-}<^] - e[{r(0 - ^ r(fc) 



x P £l (A - 1)}^ + (r(0 - r« - 1)] 

= ix 71 ^ 1 ^ + 0(X 3 ) 



(39) 



Here, we have defined the vector X T = 
({6z(0),5z(0)},...,{6z(t/e),6z(t/e)}). The matrix 
I! -1 is banded tri-diagonal, with 



\ 






n - 





n "o 


-no! 1 








-nro 1 














-n^ 1 


^22 


n 



-1 

23 



(40) 



Here, 



n™ 1 = ( o ) ' |A " > '«' = " ?a «' 



! , 7 + eA(fc - 1) 

- \ 



k -^ k \ I + eA T (k-l) 



The matrices A and B are defined by 



(41) 



[A] u , = 6^ (L - £ + 1)[ M + vp + ] + 5tf [Lf(0 
-J2Lf(Z 1 )P ei -v{(L-Op++tP-}-L»} 

+L[f(0 - f(t')]Pt + <W, £ ' (£ + l)[^ + (42) 
[P] £>£ < = 6^2Lf($)NPt - L[f(0 + f(£)]NP £ P s , (43) 



Here, A a symmetric matrix [A T (fc)]^ ^ = [A(fc)]^' ^. By 
standard matrix inversion, we obtain 



n(t/c) 



n- l (t/c-i) 



0. 



n t/M/e-l 



V n t/i-l,t/e / 
n t/M/e 



(44) 



Calculating the inverse in Eq. ([44|). we obtain 



n(*/ e ) 



= 6 



t/e, t/e 



't/e,t/e 



n 



t/e, t/e 



n(i/ e -i) 



00. 



-n 






t/e,t/e-l 



V n t/e-l,t/e / 



n 



t/e, t/e ~ n t/e,t/e-l°*/e-l,*/e-l 
1 



-1 



x n 



t/e-l,t/e 

From this recursive equation, we find 

-l 



(45) 



boo = 
hi = 



n 



I 

1 -N° 



I 

1 {I + eA(0)}[-N°]{I + eA T (0)} + eB(0) 

(46) 



From Eq. (|46|) , proceeding by induction, we prove that 
the matrices bk possess the structure 



bk,k = 



I 

1 G{k) 



and after the recursion relation 

bk,k 

we obtain 



n fc 1 - n fc,fc-A-l,fc-i n fe-i,fc 



(47) 



(48) 



C(k) = [I + eA(k-l)]C{k-l)[I + eA T {k-l)} 
+ eB(k - 1) 

C(0) = -N° (49) 
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In the continuous time limit e — >• 0, Eq. (|49j) becomes 
a Lyapunov equation 



— C = B + AC + CA 1 
at 

C(0) = -N a 



(50) 



with [NX ,, = 5 5J :>n £ . 



APPENDIX 3 

Now, we derive the saddle point equations for the Eigen 
model. We look for a saddle point of the action Eq. ([24]) 
in the coherent fields z^(t) and z^(t). The first condition 
is 



SS 
Sz e (t) 



x E 1 /KO + % (*)P&(*) 

«1,?2,?3=0 ^ 

" %(*)](% ,£%, (*) + "W^lC*)) 



E 



1 + % (*)] [% (t) - % (i)] f3 (*)% 1£ 
^W%,«)|=0 (51) 



L 



where T is the total integration time in Eq. (|24[) . which 
we typically set as T = oo. This saddle-point condition 
is satisfied by the solution z|(i) = 0, for < t < T. 
The saddle-point condition in the fields z^(t) is 



SS 



"■£ \ i 



%(0)/ W iV 



^l) E {^2,6/(6) w 
%(*)] + [i + %(*)]&,,£ - <W] W (*)*& (*) 



- E 



equation for the probability distribution P^(t) = z^(t)/N, 

L 

E^' r ^') p e'(*)- p sW 



|™-('-z 



£ =o 



x E^') F £'W 
£'=o 

x d(e') 1 



^(o-E p £'(*) 



£ =o 



E l^'+iP+^-C') 

£'=0 



L 

x ^K(l-cVp4>((')^(*) 

£'=0 



(53) 



and the initial condition P^(0) — tk/N. In deriving Eq. 
(|53"|) from Eq. (j5"2")l . we used the properties: X^=o ^*£ = 
and X)£=oQw = L 



APPENDIX 4 



Now, let us consider the expansion of the action Eq. 
(f2"4"| for the Eigen model near the saddle point, with fluc- 
tuations near the saddle-point solution given by Sz^(k) = 
z^{k) — z^(k), and 8z^(k) = z^(k) — z|(fc). 



S-Sc = E 

£=0 
t/e 



6zz(0)6zt{0) + -npz^Sz^O) 



fc=i 



t/e 



fc=l L 



£i,£2,£s 
Q£2,£i-iP-6]/(Ci) 



+ [i + %(«)]fe,rfe,?) 



,(^&W =0 (52) 



In combination with the solution z|(t) = obtained 
from Eq. (jBTj) . after Eq. (|5"2"j) we obtain the differential 



y £,£',£" 
x [Sz^N 2 P^P e > 

+ NPz5z£»{k-l)+NP?,5zt(k-l)] 

+ E d (0[<^(fc) - ^(fcji^cfcjj^p^ 

£,£' 

+ Arp e fc£,(fc-i) + jvP£/&z £ (fc-i)] 
- E W£'.£ + iP+( L -0 + Q£',£- 1 

fc=l £,£',£" 

x p^}r(0[Sz^(k) -5z^(k)][5z s (k)N 2 P s P^ 

+ NP^Sz^ (k - 1) 

+ NP^Sz^k-l)} + 0[(Sz,Sz) 3 } 



X 1 IL^X + 0(X d ) 



(54) 
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Here, we defined X T = 

({6z(0),Sz(0)},...,{6z(t/e),6z(t/e)}). The matrix 
n _1 is tridiagonal by blocks, as in the case of the 
parallel model. A similar analysis holds for the Eigen 
model as well, with matrices A and B defined as 



E Qe,e"ttt")Ps" 
e"=o 



- s w £/(£>«»-/(£>« 

?"=o 

r L 



+6 



E d(6)Pft - d(0 
Si=o 



r i 



[d« ) - d(0]Pe 

£ (Vr ;/' f' 

;"=o v 

+Q«"+i/>+(W) 1/(0^- + (q^'-ip-C' 

+Q f ,c +1 p+(i-$')V(e')-« M ' £ (mW) 



£"=0 



/(e )p { «- p+(£-e)+p-$ /(m 



(55) 



L" 1 [B]^=^(1- I 



+ 2( E^^jn^r 



(/(f) + /(£ ))^' 



x/(0 



P^-(d(0+d(O)i^iV 



(56) 



A recursion relation identical to Eq. (|50|) is obtained, 
which in the continuous time limit e — >• yields a Lya- 
punov equation for the matrix C, 



—C = B + AC + CA T 
at 

with initial condition C, ,< = — <5 C ,< 



(57) 
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